Efficient stochastic algorithms for document clustering



Similar resources

Efficient stochastic algorithms for document clustering

Clustering has become an increasingly important and highly complicated research area for targeting useful and relevant information in modern application domains such as the World Wide Web. Recent studies have shown that the most commonly used partitioning-based clustering algorithm, the K-means algorithm, is more suitable for large datasets. However, the K-means algorithm may generate a local o...


Algorithms for Soft Document Clustering

The aim of this paper is to highlight the possibilities of the so-called "soft clustering" algorithms. The traditional "hard clustering" approach allows a document to be included in only one cluster. Soft clustering algorithms, like Fuzzy C-means (FCM), Word Base Soft Clustering (WBSC), Similarity-Based Soft Clustering Algorithm (SISC) and Kondadadi and Kozma modified ART (KMART), al...


Efficient Ensemble Methods for Document Clustering

Recent ensemble clustering techniques have been shown to be effective in improving the accuracy and stability of standard clustering algorithms. However, an inherent drawback of these techniques is the computational cost of generating and combining multiple clusterings of the data. In this paper, we present an efficient kernel-based ensemble clustering method suitable for application to large, ...


Efficient Prediction-Based Validation for Document Clustering

Recently, stability-based techniques have emerged as a very promising solution to the problem of cluster validation. An inherent drawback of these approaches is the computational cost of generating and assessing multiple clusterings of the data. In this paper we present an efficient prediction-based validation approach suitable for application to large, high-dimensional datasets such as text co...


Space-Efficient Algorithms for Document Retrieval

We study the Document Listing problem, where a collection D of documents d_1, ..., d_k of total length ∑_i d_i = n is to be preprocessed, so that one can later efficiently list all the ndoc documents containing a given query pattern P of length m as a substring. Muthukrishnan (SODA 2002) gave an optimal solution to the problem; with O(n) time preprocessing, one can answer the queries in O(m+ndo...



Journal

Journal title: Information Sciences

Year: 2013

ISSN: 0020-0255

DOI: 10.1016/j.ins.2012.07.025